Self-Supervised Vision-Based Detection of the Active Speaker as a Prerequisite for Socially-Aware Language Acquisition

نویسندگان

  • Kalin Stefanov
  • Jonas Beskow
  • Giampiero Salvi
چکیده

This paper presents a self-supervised method for detecting the active speaker in a multi-person spoken interaction scenario. We argue that this capability is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. Our methods are able to detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their face. Our methods do not rely on external annotations, thus complying with cognitive development. Instead, they use information from the auditory modality to support learning in the visual domain. The methods have been extensively evaluated on a large multi-person face-toface interaction dataset. The results reach an accuracy of 80% on a multi-speaker setting. We believe this system represents an essential component of any artificial cognitive system or robotic platform engaging in social interaction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Acquisition of Definiteness Feature by Persian L2 Learners of English

The definiteness feature in English is both LF and PF interpretable while Persian is a language in which this feature is LF-interpretable but PF-uninterpretable. Hence, there is no overt article or morphological inflection in Persian denoting a definite context. Furthermore, Persian partially encodes specificity not definiteness. In definiteness both the speaker and hearer are involved while in...

متن کامل

A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...

متن کامل

Vision-based Active Speaker Detection in Multiparty Interactions

This paper presents a supervised learning method for automatic visual detection of the active speaker in multiparty interactions. The presented detectors are built using a multimodal multiparty interaction dataset previously recorded with the purpose to explore patterns in the focus of visual attention of humans. Three different conditions are included: two humans involved in taskbased interact...

متن کامل

Requestive Speech Acts Realization Patterns: Observation from Persian

Without knowing the speech act functions, it would be difficult to make correct requests in a language. Studies in pragmalinguistics have shown that conventionally direct and indirect requestive patterns are perceived differently in different speech communities. This study investigates the perception of the requestive speech acts by Persian native speakers to determine the socially appropriate ...

متن کامل

The Role of Sociolinguistics in Second Language Acquisition

Learning a new language also involves learning a broad system of norms for social relations.This study broadly showed how EFL learners’ speech act is conveyed from their nativecultures when they are communicating in English and demonstrated that there are somepossibilities of cross-cultural misunderstanding when interlocutors are engaged in the speechact of complimenting with native speakers of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1711.08992  شماره 

صفحات  -

تاریخ انتشار 2017